feat: taint checking and security #197

ambrishrawat · 2025-10-14T19:21:17Z

This PR introduces a minimal proof-of-concept for taint and security propagation across CBlock, ModelOutputThunk, and session flows, as discussed in generative-computing/mellea#189.

mergify · 2025-10-14T19:21:52Z

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:
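
For readers who want to check a title locally, the rule above can be sketched in Python. The pattern is copied from the rule; the helper name `is_conventional` is hypothetical, and treating `~=` as a regular-expression match is an assumption about how the merge protection evaluates it.

```python
import re

# Pattern copied verbatim from the merge-protection rule.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:"
)


def is_conventional(title: str) -> bool:
    """Return True if the title starts with an allowed type, an optional scope, and a colon."""
    return CONVENTIONAL_TITLE.match(title) is not None


print(is_conventional("feat: taint checking and security"))  # True
print(is_conventional("feat(security): taint checking"))     # True
print(is_conventional("Add taint checking"))                 # False
```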

ambrishrawat · 2025-10-17T14:15:05Z

@nrfulton quick clarifications -

What’s the best way to expose taint configuration to devs? For example, when a description includes a user variable, as in summarise the following {{email_body}}, should taint be inferred automatically, or should it be something they can configure?
Would it make sense to have a global strictness setting to toggle between warnings and exceptions for taint violations? Is blocify the best place for this?
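
One way to picture the strictness toggle asked about above is a small module-level setting. This is only a sketch: `TaintStrictness`, `TaintViolation`, `set_taint_strictness`, and `report_taint_violation` are hypothetical names, not part of this PR or of mellea.

```python
import warnings
from enum import Enum


class TaintStrictness(Enum):
    """Hypothetical global modes for handling taint violations."""
    WARN = "warn"
    RAISE = "raise"


class TaintViolation(Exception):
    """Hypothetical error raised in strict mode."""


_strictness = TaintStrictness.WARN  # module-level default


def set_taint_strictness(level: TaintStrictness) -> None:
    """Switch between warning and raising for all later violations."""
    global _strictness
    _strictness = level


def report_taint_violation(message: str) -> None:
    """Raise in strict mode; otherwise emit a warning and continue."""
    if _strictness is TaintStrictness.RAISE:
        raise TaintViolation(message)
    warnings.warn(message, stacklevel=2)
```

A global switch keeps call sites simple, at the cost of hidden state; a per-session setting would be the more explicit alternative.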

nrfulton · 2025-10-22T19:55:22Z

What’s the best way to expose taint configuration to devs? For example, when a description includes a user variable, as in summarise the following {{email_body}}, should taint be inferred automatically, or should it be something they can configure?

We should infer automatically wherever possible. In this case, I'm not sure how you would infer taint. I guess your assumption here is that email_body, or any user-variable input, should entail taint?

ambrishrawat · 2025-10-23T12:47:18Z

Yes, that was the thinking. Making it configurable may make more sense for taint. Any thoughts on the best way to expose that? Would love your take on the code too.

davidcox · 2025-10-23T13:24:53Z

If there is a tainted variable in the context, everything downstream should get tainted. As for how variables get tainted in the first place, a common way people do this is to define sources, sinks, and (optionally) washers. These are wrappers around interfaces that produce sensitive data (e.g. HR database api), or where it enters an unsafe place (e.g. sending to a UI).
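
The source/sink/washer pattern described above can be sketched with plain decorators. Everything here is hypothetical and independent of mellea: `Value`, `source`, `washer`, `sink`, `derive`, and the three example functions are illustrative names only.

```python
from dataclasses import dataclass
from functools import wraps


@dataclass(frozen=True)
class Value:
    """Hypothetical carrier for a string plus a taint flag."""
    text: str
    tainted: bool = False


def source(fn):
    """Wrap a producer of sensitive data so its result is marked tainted."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        return Value(fn(*args, **kwargs), tainted=True)
    return wrapper


def washer(fn):
    """Wrap a sanitiser; its output is treated as untainted."""
    @wraps(fn)
    def wrapper(value: Value) -> Value:
        return Value(fn(value.text), tainted=False)
    return wrapper


def sink(fn):
    """Wrap an unsafe destination so that it refuses tainted values."""
    @wraps(fn)
    def wrapper(value: Value):
        if value.tainted:
            raise PermissionError(f"tainted value reached sink {fn.__name__!r}")
        return fn(value.text)
    return wrapper


def derive(text: str, *inputs: Value) -> Value:
    """Downstream values are tainted if any input is tainted."""
    return Value(text, tainted=any(v.tainted for v in inputs))


@source
def read_hr_record(employee_id: str) -> str:
    return f"salary record for {employee_id}"


@washer
def redact(text: str) -> str:
    return "[REDACTED]"


@sink
def send_to_ui(text: str) -> str:
    return f"displayed: {text}"
```

With these wrappers, `send_to_ui(derive("Summary", read_hr_record("e42")))` raises `PermissionError`, while passing the same value through `redact` first is allowed.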

nrfulton · 2025-11-21T01:18:11Z

docs/dev/taint_analysis.md

+component = CBlock("user input")
+component.mark_tainted()  # Sets SecLevel.tainted_by(component)


CBlocks should be immutable.

Naming a variable component and assigning it to a CBlock is confusing.

The cyclic reference here is a bit confusing and invites buggy code. Use tainted_by(None) instead of tainted_by(self) for the root node.

c = CBlock("user input", sec_level=SecLevel.tainted_by(None))

Updated this; tainted_by(None) for root now
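
As a rough illustration of the shape this review converges on (an immutable block whose root taint is recorded as tainted_by(None)), here is a self-contained sketch. The `SecLevel` and `CBlock` classes below are simplified stand-ins written for this note, not the mellea implementations; only the names `tainted_by`, `none`, `is_tainted`, and `sec_level` come from the thread.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(frozen=True)
class SecLevel:
    """Stand-in security level; `source` is None for a root taint source."""
    tainted: bool = False
    source: object | None = None

    @classmethod
    def none(cls) -> SecLevel:
        return cls()

    @classmethod
    def tainted_by(cls, source: object | None) -> SecLevel:
        # Passing None marks the root of a taint chain, avoiding a self-reference.
        return cls(tainted=True, source=source)

    def is_tainted(self) -> bool:
        return self.tainted


@dataclass(frozen=True)
class CBlock:
    """Stand-in immutable block: taint is fixed at construction time."""
    value: str
    sec_level: SecLevel = field(default_factory=SecLevel.none)


root = CBlock("user input", sec_level=SecLevel.tainted_by(None))
derived = CBlock("summary of user input", sec_level=SecLevel.tainted_by(root))

try:
    root.value = "mutated"  # frozen dataclasses reject attribute assignment
except FrozenInstanceError:
    pass
```

Because the block is frozen, there is no `mark_tainted()` step that could be forgotten or applied late; taint is part of the value from the start.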

nrfulton · 2025-11-21T01:19:11Z

docs/dev/taint_analysis.md

+component = CBlock("user input")
+component.mark_tainted()  # Sets SecLevel.tainted_by(component)
+
+if component._meta["_security"].is_tainted():


Why not c.sec_level?

Defined it as a property and now this works as c.sec_level.is_tainted()

nrfulton · 2025-11-21T01:20:25Z

docs/examples/security/taint_example.py

+print(f"Original CBlock is tainted: {not tainted_desc.is_safe()}")
+
+# Create session
+session = MelleaSession(OllamaModelBackend("llama3.2"))


Unless the example critically depends on using a particular model, always use session = start_session() instead. This makes the examples easier to maintain.

nrfulton · 2025-11-21T01:21:07Z

docs/examples/security/taint_example.py

+
+# The result should be tainted
+print(f"Result is tainted: {not result.is_safe()}")
+if not result.is_safe():


We should use is_tainted instead of is_safe. The meaning of safe is very ambiguous.

Removed all instances of is_safe

nrfulton · 2025-11-21T01:22:19Z

mellea/security/core.py

+        Returns:
+            The CBlock or Component that tainted this content, or None
+        """
+        if self.level_type == "tainted_by":


Especially in a module called security.core, we should avoid use of magic strings.

Created SecLevelType enum
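
A minimal sketch of what replacing the magic string with an enum might look like. The member names `NONE` and `TAINTED_BY`, the constructor signature, and the method name `get_taint_source` are assumptions; the thread confirms only that a `SecLevelType` enum was created.

```python
from enum import Enum, auto


class SecLevelType(Enum):
    """Assumed members; the real enum may differ."""
    NONE = auto()
    TAINTED_BY = auto()


class SecLevel:
    """Stand-in class showing an enum comparison instead of a string literal."""

    def __init__(self, level_type: SecLevelType, source: object = None):
        self.level_type = level_type
        self.source = source

    def get_taint_source(self) -> object:
        # A typo such as SecLevelType.TAINED_BY fails loudly with AttributeError,
        # whereas a mistyped string literal would silently compare unequal.
        if self.level_type is SecLevelType.TAINTED_BY:
            return self.source
        return None
```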

nrfulton · 2025-11-21T01:24:24Z

mellea/security/core.py

+            sources.append(action)
+
+    # For Components, check their constituent parts for taint
+    if hasattr(action, 'parts'):


Instead use something like:

match action: case Component... case CBlock...

(If type(action) :> Component, then the check is not necessary because the Component protocol has a parts() method.)

Updated it to use match/case

ambrishrawat · 2025-11-25T13:27:04Z

Thanks for the review, @nrfulton!
I have incorporated your suggestions. I'd appreciate another pass when you get the chance.

guicho271828 · 2025-11-26T15:50:04Z

hi ambrish!

nrfulton · 2025-12-10T17:36:16Z

Thanks for the changes; looks good. I'll run the workflows and we'll merge before the last release of the year (which will probably be next Wednesday)

ambrishrawat · 2025-12-21T06:38:48Z

Great, would be good to get this in.
I am interested in running some quantitative benchmarks once it's merged.

nrfulton

There are some issues with the way that taint information is stored.

Proposal:

Introduce a TaintAnalysis Protocol with sec_metadata() and/or sec_level() methods.
CBlock, Component, and ModelOutputThunk should implement the TaintAnalysis protocol. Use private fields on each to do this, but access those fields through the protocol methods.

After doing this, it will be easier to fix the soundness bug in security/core.py.

I'm also not a huge fan of the local imports as a work-around to circular imports. But we can merge that in for now and I'll clean it up in our refactor, which will probably move these files around anyways.
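
To make the proposal concrete, here is one possible reading of it as a `typing.Protocol`. This is a sketch under assumptions: only the names `TaintAnalysis`, `sec_level`, `CBlock`, and `ModelOutputThunk` come from the proposal; the classes are stand-ins, and `sec_metadata()` is omitted for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class SecLevel:
    """Stand-in security level."""
    tainted: bool = False

    def is_tainted(self) -> bool:
        return self.tainted


@runtime_checkable
class TaintAnalysis(Protocol):
    """Anything that can report its own security level."""

    def sec_level(self) -> SecLevel | None: ...


class CBlock:
    def __init__(self, value: str, sec_level: SecLevel | None = None):
        self.value = value
        self._sec_level = sec_level  # private storage, read via the protocol method

    def sec_level(self) -> SecLevel | None:
        return self._sec_level


class ModelOutputThunk:
    def __init__(self, value: str | None, sec_level: SecLevel | None = None):
        self.value = value
        self._sec_level = sec_level

    def sec_level(self) -> SecLevel | None:
        return self._sec_level


def is_tainted(obj: object) -> bool:
    """Query taint through the protocol rather than hasattr(obj, "_meta")."""
    if not isinstance(obj, TaintAnalysis):
        return False
    level = obj.sec_level()
    return level is not None and level.is_tainted()
```

Callers then depend on one method rather than on where each class happens to store its metadata.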

nrfulton · 2026-01-09T17:12:27Z

mellea/stdlib/base.py

+        # Add security metadata based on taint sources
+        from mellea.security import SecLevel, SecurityMetadata


Is this to avoid circular imports?

nrfulton · 2026-01-09T17:13:44Z

mellea/stdlib/session.py

    def instruct(
        self,
-        description: str,
+        description: str | CBlock,


@jakelorocco any objections?

nrfulton · 2026-01-09T17:14:07Z

test/stdlib_basics/test_security_comprehensive.py

+
+    def test_sec_level_none(self):
+        """Test SecLevel.none() creates safe level."""
+        from mellea.security.core import SecLevelType


Move the import to the top level.

nrfulton · 2026-01-09T17:15:38Z

mellea/security/core.py

+            except Exception:
+                # If parts() fails, continue without it
+                pass


Remove this block. The parts() methods are now fully implemented as of v. 0.2.3.

nrfulton · 2026-01-09T17:17:33Z

mellea/stdlib/base.py

        self,
        value: str | None,
        meta: dict[str, Any] | None = None,
+        sec_level: Any = None,


Should this be sec_level: SecLevel | None?
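
If the `Any` annotation exists only to dodge a circular import, a common Python idiom is to import the type under `typing.TYPE_CHECKING` and defer annotation evaluation. The sketch below assumes that is the situation; whether it fits the actual module layout is not established in this thread, and the `CBlock` body is a stand-in.

```python
from __future__ import annotations  # annotations are stored as strings, not evaluated

from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    # Seen only by static type checkers; skipped at runtime, so no import cycle.
    from mellea.security import SecLevel


class CBlock:
    """Stand-in constructor showing the precise annotation without a runtime import."""

    def __init__(
        self,
        value: str | None,
        meta: dict[str, Any] | None = None,
        sec_level: SecLevel | None = None,
    ):
        self.value = value
        self._meta = dict(meta) if meta is not None else {}
        self.sec_level = sec_level
```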

nrfulton · 2026-01-09T17:20:41Z

mellea/stdlib/base.py

+        if sec_level is not None:
+            from mellea.security import SecurityMetadata
+
+            self._meta["_security"] = SecurityMetadata(sec_level)


Why not self.sec_metadata = SecurityMetadata(sec_level)?

nrfulton · 2026-01-09T17:23:21Z

mellea/security/core.py

+    # Check if action has security metadata and is tainted
+    if hasattr(action, "_meta") and "_security" in action._meta:
+        security_meta = action._meta["_security"]
+        if isinstance(security_meta, SecurityMetadata) and security_meta.is_tainted():
+            sources.append(action)


This use of hasattr feels like a code smell.

nrfulton · 2026-01-09T17:24:51Z

mellea/security/core.py

+                for part in parts:
+                    if hasattr(part, "_meta") and "_security" in part._meta:
+                        security_meta = part._meta["_security"]
+                        if (
+                            isinstance(security_meta, SecurityMetadata)
+                            and security_meta.is_tainted()
+                        ):
+                            sources.append(part)


I think this is unsound. Counter-example: consider any case where c.parts() contains elements of type Component and that component has cblocks with tainted data.
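
The counter-example can be reproduced with stand-in classes. In this sketch (none of it is mellea code), `shallow_taint_sources` mirrors the one-level loop in the diff, and `deep_taint_sources` is one possible recursive alternative.

```python
from dataclasses import dataclass, field


@dataclass
class CBlock:
    """Stand-in block with a plain boolean taint flag."""
    value: str
    tainted: bool = False


@dataclass
class Component:
    """Stand-in component; parts() returns its direct children."""
    children: list = field(default_factory=list)

    def parts(self) -> list:
        return list(self.children)


def shallow_taint_sources(component: Component) -> list:
    """One-level check: looks only at direct parts."""
    return [p for p in component.parts() if isinstance(p, CBlock) and p.tainted]


def deep_taint_sources(node: object) -> list:
    """Recursive check: descends into nested Components."""
    if isinstance(node, CBlock):
        return [node] if node.tainted else []
    if isinstance(node, Component):
        found = []
        for part in node.parts():
            found.extend(deep_taint_sources(part))
        return found
    return []


secret = CBlock("email body", tainted=True)
outer = Component([CBlock("Summarise:"), Component([secret])])

assert shallow_taint_sources(outer) == []  # the nested taint is missed
assert deep_taint_sources(outer) == [secret]  # the recursive walk finds it
```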

ambrishrawat marked this pull request as draft October 14, 2025 19:21

nrfulton self-requested a review October 15, 2025 16:54

ambrishrawat force-pushed the security_poc branch from 4798f64 to d1b09e1 Compare November 11, 2025 19:24

ambrishrawat marked this pull request as ready for review November 12, 2025 11:11

ambrishrawat added 9 commits November 14, 2025 15:34

version 2 of taint tracking

7d4e73e

Signed-off-by: Ambrish Rawat <[email protected]>

version 2 of taint tracking

4c9ab68

Signed-off-by: Ambrish Rawat <[email protected]>

taint tracking updates for ollama and litellm

daaf5ee

Signed-off-by: Ambrish Rawat <[email protected]>

docs taint analysis

2fd7bc0

Signed-off-by: Ambrish Rawat <[email protected]>

restored formatter to original

0efbee6

Signed-off-by: Ambrish Rawat <[email protected]>

removed redundant sanitise helper

5e1a266

Signed-off-by: Ambrish Rawat <[email protected]>

updated taint analysis dev docs

4ea932a

Signed-off-by: Ambrish Rawat <[email protected]>

updated taint analysis dev docs

b0a23a5

Signed-off-by: Ambrish Rawat <[email protected]>

updated taint analysis dev docs

5466c31

Signed-off-by: Ambrish Rawat <[email protected]>

ambrishrawat force-pushed the security_poc branch from d5f8033 to 5466c31 Compare November 14, 2025 15:35

ambrishrawat and others added 3 commits November 18, 2025 16:01

Merge branch 'generative-computing:main' into security_poc

5f85458

Merge branch 'generative-computing:main' into security_poc

0075de9

Merge branch 'main' into security_poc

7876e62

nrfulton requested changes Nov 21, 2025

ambrishrawat added 2 commits November 25, 2025 12:17

updates based on PR feedback

c303b87

Signed-off-by: Ambrish Rawat <[email protected]>

added tests for taint_sources of a Component

f85f953

Signed-off-by: Ambrish Rawat <[email protected]>

minor doc updates to remove the use of word safe

c1062e5

Signed-off-by: Ambrish Rawat <[email protected]>

Merge branch 'main' into security_poc

13a853e

nrfulton self-requested a review December 2, 2025 02:15

ambrishrawat and others added 4 commits December 2, 2025 13:34

fixing the linter error in parts

f29d529

Signed-off-by: Ambrish Rawat <[email protected]>

fixing the linter errors from ruff

fdafe4e

Signed-off-by: Ambrish Rawat <[email protected]>

Merge branch 'generative-computing:main' into security_poc

ef71608

Merge branch 'generative-computing:main' into security_poc

10864bd

nrfulton and others added 2 commits December 12, 2025 11:54

Merge branch 'main' into security_poc

e495469

Merge branch 'generative-computing:main' into security_poc

55aac37

ambrishrawat and others added 5 commits January 8, 2026 22:16

rebasing and fixing merge conflicts

5be6bc5

Signed-off-by: Ambrish Rawat <[email protected]>

added taint tracking to all backends

3911adb

Signed-off-by: Ambrish Rawat <[email protected]>

updated taint tracking dev doc

f0e4f89

Signed-off-by: Ambrish Rawat <[email protected]>

Merge branch 'main' into security_poc

6a4afbd

keeping the original signature of parts in instruction.py

85ddcbd

Signed-off-by: Ambrish Rawat <[email protected]>

nrfulton requested changes Jan 9, 2026

4 participants